ABSTRACT

Ever-growing interest in microRNAs has greatly 
increased the number of resources and research 
papers devoted to the field and, as a result, it 
has become increasingly difficult to find 
miRNA data of interest. To mitigate this problem, 
we created the miRNEST database 
(http://mirnest.amu.edu.pl), an integrative microRNA resource. 
In its updated version, named miRNEST 2.0, the 
database is complemented with our extensive 
miRNA predictions from deep sequencing libraries, 
data from plant degradome analyses, results of 
pre-miRNA classification with HuntMi and miRNA 
splice site information. We also added download and 
upload options and improved the user interface to 
make it easier to browse through miRNA records. 

INTRODUCTION 

microRNAs (miRNAs) are a class of negative regulators 
of gene expression, widely identified in animals and plants. 
In plants, miRNAs participate in different aspects of 
growth and developmental processes, including lateral 
root formation or transition from the juvenile to the adult 
vegetative phase (1). They are also key players in the response to 
stress conditions, such as drought, low temperatures or 
nitrogen deficiency (2). Animal miRNAs are believed to 
regulate more than half of protein-coding genes and, like 
in plants, are implicated in a number of biological 
processes (3). Notably, multiple miRNAs have been 
associated with diseases, like cancers or rheumatoid 
arthritis (4). 

Because miRNAs are key regulators of molecular 
processes in the cell and could find multiple 
applications in biotechnology, molecular biology and 
medicine, methods for their identification and study 
have been developed extensively. The growing number 
of miRNA studies has improved our understanding of their 
biology and, consequently, led to the accumulation of 
miRNA databases. However, many of them are limited 
to species of high interest, selected taxa or miRNAs 
involved in some specific processes. For instance, 
miRNeye (5) collects data about miRNA expression in 
mouse eye, whereas GrapeMiRNA stores sequences 
from Vitis vinifera (6). miRBase (7), on the other hand, 
although it accommodates data from a wide range of 
species, contains only already published results. 
A single universal repository is therefore needed, so that 
users do not have to browse through a number of 
dispersed data sets to collect information related to 
a specific species or miRNA type. 

Previously, we took up this challenge and we developed 
miRNEST, a comprehensive online resource for plant, 
animal and virus miRNAs. Using a comparative 
approach, we identified 10 004 miRNA candidates in 221 
animal and 199 plant species. As our goal was not only to 
identify new miRNAs but also to develop a resource that 
would integrate miRNA data scattered across literature 
and databases, we also incorporated miRNA sequences 
from three other databases and two publications. 
Additionally, based on availability, we used data from 
12 resources providing further annotation for miRNAs 
from selected species. Here we present miRNEST 2.0, an 
updated version of the database. In addition to 39 122 
miRNAs from miRNEST 1.0 (10 004 from our EST 
analysis and 29 118 from other resources), we predicted 
18 043 pre-miRNAs using small RNA deep sequencing 
data from 21 species. For miRNAs in 10 species, we 
provide targets inferred from degradome libraries. We 
also added miRNA splice site information, HuntMi (8) 
predictions and new database functionalities, including 
a download option. Taken together, miRNEST 2.0 is a 
large and comprehensive resource of miRNA data that 
bears distinct improvements over its previous version. 



MATERIALS AND METHODS 

miRNA prediction from sRNA deep sequencing data 

Figure 1. The pipeline used for large-scale miRNA discovery from sRNA deep-sequencing data. 

For miRNA predictions, we downloaded 171 small RNA 
deep sequencing libraries from 8 plant and 13 animal 
species from the GEO database (9) (Figure 1, 
Supplementary Table S1). Reads 19-26 bases long were 
kept and we mapped them to corresponding plant or 
animal genomes using Bowtie (10). In the mapping step, 
no mismatches were allowed and reads mapping to >20 
distinct locations were discarded. Mapped reads that were 
19-22 nt long and had a count >5 were considered 
'potential mature miRNAs'. We retrieved their sequences from 
genomes along with flanking genomic sequences of 150 
bases in animals and 250 bases in plants, and then we 
predicted secondary structures using hybrid-ss-min from 
UNAFold package (11). We kept only sequences with 
miRNA-like secondary structures: a stem-loop structure 
with the 'potential mature miRNA' located in a single 
hairpin arm; no more than six mismatches and three 
bulges (animals) or five mismatches and two bulges 
(plants) between the mature miRNA and the opposite 
hairpin arm. If a stem-loop structure was surrounded by 
additional nucleotides, the flanking regions were cut off. 
Subsequently, we checked similarity to non-coding 
RNAs from RFAM (12) and proteins from UniProt 
(UniProtKB/Swiss-Prot protein data set) (13) using 
BLAST (14). Sequences showing similarity to RFAM 
non-miRNAs with E < 1e-10 or UniProt proteins with 
E < 1e-20 were discarded. After that we searched for 
low-complexity regions using Dustmasker (14); sequences 
containing >60% low-complexity regions were removed. 
Finally, we made sure that the reads mapped to the hairpin 
showed a miRNA-like profile. To achieve this, we kept 
only the hairpins where (i) the 'potential mature miRNA' 
corresponded to the most abundant read in at least one 
library, (ii) the abundance of the 'potential mature miRNA' 
constituted at least 20% of total read counts in at least 
one library and (iii) the total count of reads starting at the 5' 
position of the 'potential mature miRNA' was the maximal 
one in at least one library. 
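
The read-level filters described above can be illustrated with a short Python sketch. It is not the code used to build the database: the `Read` record, the function names and the way ties are handled are assumptions made for illustration, and the three read-profile criteria are allowed to be satisfied in different libraries, which is one possible reading of the text.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Read:
    """Hypothetical record for one distinct read mapped to a hairpin."""
    start: int      # 5' position of the read on the hairpin
    sequence: str   # read sequence
    count: int      # number of times the read was seen in one library


def is_potential_mature(read: Read) -> bool:
    """Length 19-22 nt and count > 5, as stated in the text."""
    return 19 <= len(read.sequence) <= 22 and read.count > 5


def has_mirna_like_profile(candidate: Read, libraries: List[List[Read]]) -> bool:
    """Check the three read-profile criteria, each in at least one library.

    Assumptions: the criteria may be met in different libraries, and a tie
    for the maximum still counts as 'most abundant' / 'maximal'.
    """
    top_read = enough_share = top_start = False
    for reads in libraries:
        # Count of the candidate sequence in this library (0 if absent).
        cand = sum(r.count for r in reads
                   if r.start == candidate.start
                   and r.sequence == candidate.sequence)
        if cand == 0:
            continue
        total = sum(r.count for r in reads)
        # Total read count per 5' start position.
        per_start = Counter()
        for r in reads:
            per_start[r.start] += r.count
        top_read = top_read or cand == max(r.count for r in reads)
        enough_share = enough_share or cand >= 0.20 * total
        top_start = top_start or (
            per_start[candidate.start] == max(per_start.values()))
    return top_read and enough_share and top_start
```

Requiring all three criteria within the same library would be a stricter variant; the sketch can be adapted by evaluating the three conditions together inside the loop.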

Newly identified miRNA candidates were checked 
against intronic sequences in the corresponding species; 
sequences that fully overlapped with introns, with the 'potential 
mature miRNA' located no more than four bases away 
from the 5' or 3' intron end, became mirtron candidates. 
We supplemented these candidates with already published 
predictions in mouse and human (15). 
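
As a rough illustration of the mirtron rule, the coordinate test could look like the sketch below. The interval convention (0-based, end-exclusive, same chromosome and strand), the function name and the reading of "fully overlapped" as "the hairpin lies entirely within the intron" are assumptions, not details given in the text.

```python
from typing import Tuple

Interval = Tuple[int, int]  # (start, end), 0-based, end exclusive


def is_mirtron_candidate(hairpin: Interval, mature: Interval,
                         intron: Interval, max_offset: int = 4) -> bool:
    """Return True if the hairpin lies within the intron and the mature
    miRNA is at most `max_offset` bases from either intron end."""
    h_start, h_end = hairpin
    m_start, m_end = mature
    i_start, i_end = intron
    hairpin_in_intron = i_start <= h_start and h_end <= i_end
    mature_in_hairpin = h_start <= m_start and m_end <= h_end
    # Distance from the mature miRNA to each intron boundary. Either end
    # is accepted, so strand orientation does not change the outcome.
    near_left_end = m_start - i_start <= max_offset
    near_right_end = i_end - m_end <= max_offset
    return (hairpin_in_intron and mature_in_hairpin
            and (near_left_end or near_right_end))
```

For example, with an intron at (100, 190) and a hairpin spanning the same interval, a mature miRNA at (102, 124) passes the test, whereas one at (130, 152) does not.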

Degradome analysis 

We downloaded 18 degradome libraries from GEO (9) 
that corresponded to 10 plant species: Arabidopsis 
thaliana, Glycine max, Hordeum vulgare, Malus domestica, 
Medicago truncatula, Physcomitrella patens, Prunus 
persica, Solanum lycopersicum, Triticum aestivum and 
V. vinifera (Supplementary Table S2). Transcript 
sequences (cDNAs) were downloaded from Ensembl 
Plants (16), and mature miRNA sequences were retrieved 
from miRNEST (17). Using PAREsnip (18), we searched 
for miRNA targets evidenced by degradome reads. 
We adjusted the program settings to look only for 
category 0, 1 and 2 targets, i.e. only high confidence 
candidates. For the resulting candidates, we prepared 
degradome read alignment files and corresponding plots 
for graphical representation of read mapping. 

HuntMi predictions 

HuntMi (8) is a machine learning tool for discrimination 
between true and false pre-miRNAs in plants, animals 
and viruses based on properties of pre-miRNA sequence 
and its secondary structure. We used this tool with 
default settings to better annotate pre-miRNAs stored 
in miRNEST. For animal, plant and virus sequences, 
different taxon-specific classifiers were used. 

miRNA splice sites prediction 

To infer miRNA splicing events from EST sequences, we 
applied a strategy previously used in ERISdb (19). In the 
first step, pre-miRNAs were searched against dbEST (20) 
using Megablast (14). We required an identity of 
97% or higher and that the EST sequence contained at 
least 90% of the known pre-miRNA sequence. The selected 
ESTs were subsequently mapped to the corresponding 
genome using Splign (21) with default settings. Finally, the 
alignments were checked manually to remove cases where 
ESTs came from the antisense strand and to correct the 
alignment wherever a splice site was broken because 
of imperfections of the EST alignment software. Additionally, 
gene structures for 45 plant miRNAs were downloaded 
from ERISdb (19). We also obtained gene structures 
from RACE experiments in Populus trichocarpa (22), and 
RNA-Seq-evidenced splice sites in V. vinifera (23). 
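
The EST selection thresholds (identity of at least 97% and at least 90% of the pre-miRNA covered) can be sketched as a simple filter over search hits. The sketch assumes that hits are available in the standard 12-column tabular BLAST format with the pre-miRNA as the query and one alignment per line; the function name and the `pre_mirna_lengths` dictionary are hypothetical, and the original analysis may have computed coverage differently.

```python
from typing import Dict, Iterable, List, Tuple


def select_est_hits(blast_lines: Iterable[str],
                    pre_mirna_lengths: Dict[str, int],
                    min_identity: float = 97.0,
                    min_coverage: float = 0.90) -> List[Tuple[str, str]]:
    """Return (pre-miRNA, EST) pairs that pass both thresholds.

    Expected columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    selected = []
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip comments and malformed lines
        query, est = fields[0], fields[1]
        identity = float(fields[2])
        q_start, q_end = int(fields[6]), int(fields[7])
        # Fraction of the pre-miRNA covered by this alignment.
        coverage = (abs(q_end - q_start) + 1) / pre_mirna_lengths[query]
        if identity >= min_identity and coverage >= min_coverage:
            selected.append((query, est))
    return selected
```

ESTs passing this filter would then be aligned to the genome and inspected manually, as described above.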

RESULTS 

In the current version, miRNEST has been substantially 
enlarged with the results of small RNA deep sequencing 
analyses. First of all, we predicted 18 043 pre-miRNAs in 
21 plant and animal species, and because miRNAs were 
often found independently in different sRNA libraries, 
this corresponds to as many as 36 468 new records in the 
database. In the search pipeline, we applied a number of 
strict criteria from the literature (17,24,25). In all, 38.1% of 
new sequences overlap with miRNAs already stored 
in miRNEST 1.0, thus providing experimental support 
for them (Supplementary Table S3). Moreover, as the 
database encompasses multiple libraries per species, it 
is possible to investigate isomiRs and changes in small 
RNA counts in different tissues and conditions. 
Although a similar functionality is available at miRBase 
(7), the analyzed species and selected deep sequencing 
libraries overlap only partly. Furthermore, for all 
miRNAs stored in miRNEST, including new predictions, 
we ran a classification analysis using HuntMi, which 
improved their annotation. Altogether, 91.16% of 
miRNEST sequences were considered true miRNAs, 
including miRNEST EST predictions (77.85%), 
miRNEST deep sequencing predictions (71.9%) and 
miRNAs from external databases (96.91%). The relatively 
high fraction of sequences recognized as true miRNAs in 
the case of external databases [miRBase (7), PMRD (26), 
microPC (27)] might be due to the fact that this data 
set largely overlaps with the miRNAs used to train HuntMi. 
Another aspect of deep sequencing analysis was the 
identification of degradome-evidenced miRNA targets in 
10 plant species. As we wanted to achieve the highest-quality 
results, only category 0, 1 and 2 candidates, as returned 
by PAREsnip, were considered. This allowed us to 
identify 2041 miRNA-target associations (Supplementary 
Table S4). 

Splicing in miRNA genes is an underestimated aspect 
of miRNA biology. So far, there is only one repository 
that stores miRNA splice site information (19). 
We incorporated these data into miRNEST 2.0 and additionally 
performed a splice site search in several species, 
which allowed us to find 17 miRNAs with introns in 
5 plant species. We also complemented these data 
with miRNA gene structures from the literature 
(P. trichocarpa, V. vinifera). 

CONCLUSIONS 

The current version of the miRNEST database contains 
twice as many miRNA records as version 1.0. Thanks 
to the small RNA deep sequencing data analysis, almost 
40% of previously predicted miRNAs are now validated by 
experimental data. Moreover, target predictions for 
miRNAs from 10 species are supported by degradome 
data. miRNEST 2.0 also has an updated user interface 
and works faster than its predecessor. We added both 
bulk data download and download from the 
'Browse' page (for user-selected miRNAs). As we want 
miRNEST to grow and be a truly comprehensive 
miRNA resource, we also enabled an upload option for 
miRNA-associated data. 

AVAILABILITY AND REQUIREMENTS 

miRNEST is freely available at http://mirnest.amu.edu.pl. 
Its previous version, miRNEST 1.0, can still be accessed 
at http://lemur.amu.edu.pl/share/php/mirnest_1.0. The 
database was constructed using Hypertext Markup 
Language (HTML), Cascading Style Sheets (CSS), PHP 
5.2.11 (http://www.php.net/) and MySQL 4.0.31 
(http://www.mysql.com/). pre-miRNA secondary structures are 
drawn using the lightweight Java applet VARNA (28), 
which requires installation of the Java plugin. 

SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online. 